Reranking-based Crash Report Deduplication
Authors
Abstract
Software projects collect large numbers of crash reports from users and deduplicate them so that bugs can be fixed efficiently. However, most existing automated methods have performance issues during large-scale clustering. We propose a reranking-based crash report clustering method that combines two earlier methods. By computing the similarity used in ReBucket only for the crash reports that are highly similar to the query crash report, the method can process reports with throughput equal to that of PartyCrasher. We also introduce an automatically generated dataset for crash report clustering tasks. The evaluation revealed that our method achieves high processing speed while maintaining high accuracy.
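The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `cheap_score` (set overlap of frame names) and `costly_score` (normalised longest common subsequence) are hypothetical stand-ins chosen for brevity, not the actual scoring used by PartyCrasher or ReBucket, and the report IDs and the shortlist size `k` are made up for illustration.

```python
from typing import Dict, List, Tuple

Stack = List[str]  # a crash report reduced to its stack frames, top frame first


def cheap_score(a: Stack, b: Stack) -> float:
    """Stage 1 stand-in (assumption): Jaccard overlap of frame names, order ignored."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def costly_score(a: Stack, b: Stack) -> float:
    """Stage 2 stand-in (assumption): longest common subsequence of frames, normalised."""
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for fa in a:
        cur = [0]
        for j, fb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if fa == fb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / max(len(a), len(b))


def rerank(query: Stack, corpus: Dict[str, Stack], k: int = 2) -> List[Tuple[str, float]]:
    """Shortlist k reports with the cheap score, then reorder them with the costly one."""
    shortlist = sorted(
        corpus, key=lambda rid: cheap_score(query, corpus[rid]), reverse=True
    )[:k]
    scored = [(rid, costly_score(query, corpus[rid])) for rid in shortlist]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical reports: r1 and r2 share the same frames, but r2 has them reversed.
corpus = {
    "r1": ["main", "parse", "read", "memcpy"],
    "r2": ["memcpy", "read", "parse", "main"],
    "r3": ["main", "draw", "blit"],
}
query = ["main", "parse", "read", "strlen"]
result = rerank(query, corpus, k=2)
```

The expensive comparison runs only on the `k` shortlisted reports rather than on the whole corpus, which is where the throughput gain described in the abstract would come from. In this toy data the cheap score cannot tell `r1` from `r2`, while the order-aware second stage ranks `r1` clearly higher.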
Similar resources
The Unreasonable Effectiveness of Traditional Information Retrieval in Crash
Organizations like Mozilla, Microsoft, and Apple are flooded with thousands of automated crash reports per day. Although crash reports contain valuable information for debugging, there are often too many for developers to examine individually. Therefore, in industry, crash reports are often automatically grouped together in buckets. Ubuntu’s repository contains crashes from hundreds of softwa...
Discriminative Reranking for Semantic Parsing
Semantic parsing is the task of mapping natural language sentences to complete formal meaning representations. The performance of semantic parsing can be potentially improved by using discriminative reranking, which explores arbitrary global features. In this paper, we investigate discriminative reranking upon a baseline semantic parser, SCISSOR, where the composition of meaning representations...
A Survey On Visual Search Reranking
Due to the explosive growth of online video and image data, visual search is becoming an important area of research. Most existing approaches use text-based image retrieval, which is not very efficient. To specify the visual documents precisely, visual search reranking is used. Visual search reranking is the rearrangement of visual documents based on initial search results or some external know...
HTTP-Level Deduplication with HTML5
In this project, we examine HTTP-level duplication. We first report on our initial measurement study, analyzing the amount and types of duplication in the Internet today. We then discuss several opportunities for deduplication: in particular, we implement two versions of a simple server-client architecture that takes advantage of HTML5 client-side storage for value-based caching and deduplication.
Consensus in Asynchronous Systems Where Processes Can Crash and Recover
The Consensus problem is now well identified as being one of the most important problems encountered in the design and the construction of fault-tolerant distributed systems. This problem is defined as follows: processes have to reach a common decision, which depends on their inputs, despite failures. We consider the Consensus problem in asynchronous distributed systems augmented with unreliabl...